7 research outputs found
Versification and Authorship Attribution
Contemporary stylometry uses a range of methods, including machine learning, to discover a poem's author based on features like the frequencies of words and character n-grams. However, there is one potential textual fingerprint stylometry tends to ignore: versification, or the very making of language into verse. Using poetic texts in three different languages (Czech, German, and Spanish), Petr Plecháč asks whether versification features like rhythm patterns and types of rhyme can help determine authorship. He then tests his findings on two unsolved literary mysteries. In the first, Plecháč distinguishes the parts of the Elizabethan verse play The Two Noble Kinsmen written by William Shakespeare from those written by his coauthor, John Fletcher. In the second, he seeks to solve a case of suspected forgery: how authentic was a group of poems first published as the work of the nineteenth-century Russian author Gavriil Stepanovich Batenkov? This book of poetic investigation should appeal to literary sleuths the world over.
The early 19th-century “Russian song” in the light of corpus research
In this article, “Russian songs” from the beginning of the 19th century, i.e. imitations or “stylisations” of non-ritual lyric Russian folksongs, are analysed using the methods of big data research. A corpus of “Russian songs” is compared to corpora consisting of both folk songs and literary texts. The poetics of “Russian songs”, surprisingly enough, does not resemble the folk songs they are supposed to be imitating, and comes closer to the literary norms of their time. The article deals with “Russian songs”, in other words pastiches of the lyric Russian folksong detached from ritual. These songs or romances resemble folk songs in form, but are most often the work of an individual poet. The relationship between a pastiche and the object of its imitation is considerably harder to describe theoretically than the use of material borrowed from folk poetry in literature in general. Research in the opposite direction, that is, research examining how literary works are adapted into folk poetry, has at a general level helped to understand the mechanisms of verbal folk tradition. This article, however, approaches Russian literary history and its stylistic variation by means of corpus analysis. A text corpus compiled from “Russian songs” is compared with corpora compiled from both folk poetry and literary texts. Stylometric methods are used to describe a reduced model that shows the correspondences and differences between the corpora. This makes it possible to approach quantitatively the problem of how elements of folk poetry are selected and transmitted in pastiches. The analysis shows that the poetics of “Russian songs” does not resemble the folk songs it imitates, but is closer to the general literary norms of its time.
Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets
The paper discusses an approach to decipher large collections of handwritten
index cards of historical dictionaries. Our study provides a working solution
that reads the cards, and links their lemmas to a searchable list of dictionary
entries, for a large historical dictionary entitled the Dictionary of the 17th-
and 18th-century Polish, which comprises 2.8 million index cards. We apply a
tailored handwritten text recognition (HTR) solution that involves (1) an
optimized detection model; (2) a recognition model to decipher the handwritten
content, designed as a spatial transformer network (STN) followed by
convolutional neural network (RCNN) with a connectionist temporal
classification layer (CTC), trained using a synthetic set of 500,000 generated
Polish words of different length; (3) a post-processing step using constrained
Word Beam Search (WBC): the predictions were matched against a list of
dictionary entries known in advance. Our model achieved an accuracy of 0.881
at the word level, which outperforms the base RCNN model. In this study we
produced a set of 20,000 manually annotated index cards that can be used for
future benchmarks and transfer learning HTR applications.
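Step (3) of this abstract constrains predictions to a list of entries known in advance. The following is a minimal sketch of that general idea only, not the Word Beam Search decoder the paper uses: a raw prediction is snapped to the nearest lexicon entry by Levenshtein distance. The function names, the sample lemmas and the `max_distance` threshold are illustrative assumptions.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def snap_to_lexicon(prediction: str,
                    lexicon: Iterable[str],
                    max_distance: int = 2) -> Optional[str]:
    """Return the closest lexicon entry, or None if none is within max_distance.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    best, best_distance = None, max_distance + 1
    for entry in lexicon:
        distance = levenshtein(prediction, entry)
        if distance < best_distance:
            best, best_distance = entry, distance
    return best


# Hypothetical lemma list and noisy recognizer outputs.
lemmas = ["kowal", "kobieta", "konik"]
matched = snap_to_lexicon("kowol", lemmas)
unmatched = snap_to_lexicon("xyzxyzxyz", lemmas)
```

A linear scan like this is only practical for small lists; for a lexicon the size of a full dictionary, an index such as a trie or BK-tree would be the more usual choice.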
Weak genres: modeling the relationship between poetic meter and meaning in Russian poetry
The paper attempts to formalize an existing theory of poetics, known as the “semantic halo of meter”, which holds that the different metrical forms of modern lyric poetry accumulate and preserve particular semantic associations. We examined a broad corpus of Russian poetry (1750–1950) with the LDA topic modeling algorithm, so that each poem is represented in a topic space and each meter by a probability distribution over topics. Using unsupervised classification and extensive sampling, we show that the link between form and meaning is strong both within and between verse forms: two samples belonging to the same meter often appear very similar, and two verse forms of the same family also end up in the same cluster most of the time. This link can be detected even when the corpus is controlled chronologically, and it is not a consequence of population size. We argue that a similar approach can also be applied to comparing the semantic fields of languages and poetic traditions, which could yield relevant answers to the most fundamental questions of literary history.
Semantics of European poetry is shaped by conservative forces: The relationship between poetic meter and meaning in accentual-syllabic verse
Recent advances in cultural analytics and large-scale computational studies
of art, literature and film often show that long-term change in the features of
artistic works happens gradually. These findings suggest that conservative
forces that shape creative domains might be underestimated. To this end, we
provide the first large-scale formal evidence of the persistent association
between poetic meter and semantics in 18th- and 19th-century European literatures, using
Czech, German and Russian collections with additional data from English poetry
and early modern Dutch songs. Our study traces this association through a
series of clustering experiments using the abstracted semantic features of
150,000 poems. With the aid of topic modeling we infer semantic features for
individual poems. Texts were also lexically simplified across collections to
increase generalizability and decrease the sparseness of word frequency
distributions. Topics alone enable recognition of the meters in each observed
language, as may be seen from highly robust clustering of same-meter samples
(median Adjusted Rand Index between 0.48 and 1). In addition, this study shows
that the strength of the association between form and meaning tends to decrease
over time. This may reflect a shift in aesthetic conventions between the 18th
and 19th centuries as individual innovation was increasingly favored in
literature. Despite this decline, it remains possible to recognize semantics of
the meters from past or future, which suggests the continuity of semantic
traditions while also revealing the historical variability of conditions across
languages. This paper argues that distinct metrical forms, which are often
copied in a language over centuries, also maintain long-term semantic inertia
in poetry. Our findings, thus, highlight the role of the formal features of
cultural items in influencing the pace and shape of cultural evolution.
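The clustering robustness in this abstract is reported as an Adjusted Rand Index (ARI). As a minimal sketch of what that score measures, the function below computes ARI from the standard pair-counting formula, comparing known meter labels with cluster assignments. The toy labels are assumptions for illustration, not data from the study.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two labelings of the same items.

    1.0 means identical partitions; values near 0 mean agreement at chance level.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Pairs of items that share a cell of the contingency table,
    # a true class, or a predicted cluster.
    same_cell = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = same_true * same_pred / total_pairs  # agreement expected by chance
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single group
    return (same_cell - expected) / (maximum - expected)


# Hypothetical samples: three iambic and three trochaic.
meters = ["iamb", "iamb", "iamb", "trochee", "trochee", "trochee"]
perfect = adjusted_rand_index(meters, [0, 0, 0, 1, 1, 1])
imperfect = adjusted_rand_index(meters, [0, 0, 1, 1, 1, 1])
```

With one iambic sample placed in the wrong cluster, the score falls well below 1, which is why a median ARI close to 1 indicates that same-meter samples are almost always grouped together.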
Deep transitions: towards a comprehensive framework for mapping major continuities and ruptures in industrial modernity
The world is confronted by a socio-ecological emergency, requiring rapid and deep decarbonization of a broad range of socio-technical systems. A recent Deep Transitions framework argues that this fundamentally unsustainable trajectory has been generated by the co-evolutionary dynamics of multiple systems during the last 250 years. Altering this direction requires transformation in industrial modernity – a set of the most fundamental ideas, institutions, and practices characterizing every industrial society to date. Although the proponents of the framework suggest that this shift has been unfolding since the 1960s, no attempts have been made to operationalize the concept of industrial modernity and to assess this claim. This paper develops a comprehensive multi-dimensional and multi-domain approach for the measurement of industrial modernity. As such it seeks to provide empirical evidence of long-term continuities and emerging ruptures in the dominant ideas, institutions, and practices of industrial societies along the domains of environment and technology. Using a methodologically novel approach where the text mining of newspapers is combined with data from various databases, the paper provides results from three countries – Australia, Germany, Soviet Union/Russia – between 1900 and 2020. Despite considerable country-level differences, the results show shifts in public environmental discourse from the 1960s, followed by institutional changes from the 1980s but with only a modest change in practices. We also observe some change in the direction of innovative activities and their regulation coupled with a resurgent optimism in technology-environment discourse. The findings tentatively suggest that industrial modernity might be in the process of hollowing out along ideational and institutional dimensions in the environmental domain but less so in the domain of technology and innovation.
CLS Infra Computational Literary Studies Infrastructure
Computational Literary Studies Infrastructure, funded by the Horizon 2020 grant scheme, is a four-year, pan-European project that aims to unify the diverse landscape of computational text analysis, in terms of available texts, tools, methods, practices and so forth, within its growing international user community. The project started in February 2021, meaning that it has been underway for just over a year. In our poster we discuss the various deliverables and activities that have come out of the CLS INFRA project in its first quarter to give an idea of its impact in practice.